Overview

Dataset statistics

Statistic | Original Dataset | Oversampled Dataset
Number of variables | 9 | 9
Number of observations | 211 | 995
Missing cells | 0 | 0
Missing cells (%) | 0.0% | 0.0%
Duplicate rows | 0 | 199
Duplicate rows (%) | 0.0% | 20.0%
Total size in memory | 24.6 KiB | 77.7 KiB
Average record size in memory | 119.3 B | 80.0 B
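The percentage and per-record figures in this table follow from simple ratios. The sketch below is a minimal check of that arithmetic; the helper names `percent` and `avg_record_bytes` are illustrative and not part of the report or of ydata-profiling, and the conversion assumes 1 KiB = 1024 bytes.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


def avg_record_bytes(total_kib, n_rows):
    """Approximate bytes per row from a total size in KiB (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_rows


# Duplicate rows in the oversampled dataset: 199 of 995 observations.
print(percent(199, 995))

# Average record size, recomputed from the rounded totals in the table.
print(round(avg_record_bytes(77.7, 995), 1))
print(round(avg_record_bytes(24.6, 211), 1))
```

Because the totals are themselves rounded to one decimal place, the recomputed record sizes can differ from the reported ones in the last digit.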

Variable types

Type | Original Dataset | Oversampled Dataset
Categorical | 4 | 4
Numeric | 5 | 5

Alerts

Alert type | Original Dataset | Oversampled Dataset
High Correlation | time is highly overall correlated with distance | Alert not present
High Correlation | velocity is highly overall correlated with distance | Alert not present
High Correlation | line_width is highly overall correlated with roughness | Alert not present
High Correlation | roughness is highly overall correlated with line_width | Alert not present
High Correlation | distance is highly overall correlated with time and 1 other fields | Alert not present
High Correlation | ink_visco_cp is highly overall correlated with surface_tension_dyne_cm and 1 other fields | Alert not present
High Correlation | surface_tension_dyne_cm is highly overall correlated with ink_visco_cp and 1 other fields | Alert not present
High Correlation | ink _density is highly overall correlated with ink_visco_cp and 1 other fields | Alert not present
Zeros | overspray has 8 (3.8%) zeros | overspray has 25 (2.5%) zeros
Duplicates | Alert not present | Dataset has 199 (20.0%) duplicate rows
High Cardinality | Alert not present | distance has a high cardinality: 77 distinct values
High Cardinality | Alert not present | ink_visco_cp has a high cardinality: 179 distinct values
High Cardinality | Alert not present | surface_tension_dyne_cm has a high cardinality: 179 distinct values
High Cardinality | Alert not present | ink _density has a high cardinality: 74 distinct values
Imbalance | Alert not present | distance is highly imbalanced (67.5%)
Imbalance | Alert not present | ink_visco_cp is highly imbalanced (64.6%)
Imbalance | Alert not present | surface_tension_dyne_cm is highly imbalanced (64.6%)
Imbalance | Alert not present | ink _density is highly imbalanced (63.7%)
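The imbalance percentages can be read as one minus the normalized Shannon entropy of the value counts. That definition is an assumption about how the tool computes the score, not something the report states, and the function name `imbalance_score` is illustrative. Under that assumption, the sketch below reproduces the 64.6% shown for ink_visco_cp from the counts listed later in this report (625, 193, and 177 values that each occur once).

```python
from math import log2


def imbalance_score(counts):
    """1 - H / log2(k): 0 for a uniform distribution, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k <= 1 or total == 0:
        return 0.0
    # Shannon entropy of the class frequencies, in bits.
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / log2(k)


# ink_visco_cp in the oversampled dataset: two frequent values plus
# 177 values that each occur once (179 distinct values, 995 rows).
counts = [625, 193] + [1] * 177
print(f"{100 * imbalance_score(counts):.1f}%")
```

The other imbalance alerts cannot be rechecked the same way from this report alone, because their full value counts are folded into "Other values" rows.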

Reproduction

Item | Original Dataset | Oversampled Dataset
Analysis started | 2023-04-13 11:53:06.807570 | 2023-04-13 11:53:10.837813
Analysis finished | 2023-04-13 11:53:10.819533 | 2023-04-13 11:53:14.947898
Duration | 4.01 seconds | 4.11 seconds
Software version | ydata-profiling v4.1.2 | ydata-profiling v4.1.2
Download configuration | config.json | config.json

Variables

distance
Categorical

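Statistic | Original Dataset | Oversampled Dataset
Distinct | 3 | 77
Distinct (%) | 1.4% | 7.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Most frequent values (Original Dataset): 900 (160), 300 (49), 270 (2)
Most frequent values (Oversampled Dataset): 900 (682), 300 (175), 270 (7), 296 (6), 903 (4), Other values (72): 121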

Length

Statistic | Original Dataset | Oversampled Dataset
Max length | 3 | 3
Median length | 3 | 3
Mean length | 3 | 3
Min length | 3 | 3

Characters and Unicode

Statistic | Original Dataset | Oversampled Dataset
Total characters | 633 | 2985
Distinct characters | 5 | 10
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
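As a minimal illustration of these character properties, the sketch below rebuilds the Original Dataset's `distance` values from the counts in this report (160 × "900", 49 × "300", 2 × "270") and tallies characters and Unicode general categories with the standard library. Storing the values as strings is an assumption made for the example. `unicodedata.category` returns short codes: `Nd` corresponds to "Decimal Number" and `Po` to "Other Punctuation".

```python
import unicodedata
from collections import Counter

# Values of `distance` in the Original Dataset, treated as text.
values = ["900"] * 160 + ["300"] * 49 + ["270"] * 2

# Count individual characters and their Unicode general categories.
char_counts = Counter(ch for value in values for ch in value)
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

print(sum(char_counts.values()))   # total characters
print(char_counts.most_common())   # characters, most frequent first
print(dict(category_counts))       # general categories seen
```

The totals agree with the tables in this section: 633 characters in all, 420 of them "0", and a single category.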

Unique

Statistic | Original Dataset | Oversampled Dataset
Unique | 0 | 41
Unique (%) | 0.0% | 4.1%

Sample

Row | Original Dataset | Oversampled Dataset
1st row | 270 | 900
2nd row | 270 | 900
3rd row | 300 | 900
4th row | 300 | 900
5th row | 300 | 900

Common Values

Original Dataset
Value | Count | Frequency (%)
900 | 160 | 75.8%
300 | 49 | 23.2%
270 | 2 | 0.9%

Oversampled Dataset
Value | Count | Frequency (%)
900 | 682 | 68.5%
300 | 175 | 17.6%
270 | 7 | 0.7%
296 | 6 | 0.6%
903 | 4 | 0.4%
891 | 4 | 0.4%
289 | 4 | 0.4%
896 | 4 | 0.4%
908 | 3 | 0.3%
292 | 3 | 0.3%
Other values (67) | 103 | 10.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Original Dataset

[Figure: plot of common values]

Oversampled Dataset

Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Original Dataset
Value | Count | Frequency (%)
900 | 160 | 75.8%
300 | 49 | 23.2%
270 | 2 | 0.9%

Oversampled Dataset
Value | Count | Frequency (%)
900 | 682 | 68.5%
300 | 175 | 17.6%
270 | 7 | 0.7%
296 | 6 | 0.6%
903 | 4 | 0.4%
891 | 4 | 0.4%
289 | 4 | 0.4%
896 | 4 | 0.4%
293 | 3 | 0.3%
285 | 3 | 0.3%
Other values (67) | 103 | 10.4%

Most occurring characters

Original Dataset
Character | Count | Frequency (%)
0 | 420 | 66.4%
9 | 160 | 25.3%
3 | 49 | 7.7%
2 | 2 | 0.3%
7 | 2 | 0.3%

Oversampled Dataset
Character | Count | Frequency (%)
0 | 1746 | 58.5%
9 | 772 | 25.9%
3 | 198 | 6.6%
8 | 79 | 2.6%
2 | 74 | 2.5%
7 | 37 | 1.2%
1 | 27 | 0.9%
6 | 18 | 0.6%
5 | 17 | 0.6%
4 | 17 | 0.6%

Most occurring categories

Category | Original Dataset | Oversampled Dataset
Decimal Number | 633 (100.0%) | 2985 (100.0%)

Most frequent character per category

Decimal Number: same characters, counts and frequencies as under "Most occurring characters", since every character is a decimal digit.

Most occurring scripts

Script | Original Dataset | Oversampled Dataset
Common | 633 (100.0%) | 2985 (100.0%)

Most frequent character per script

Common: same characters, counts and frequencies as under "Most occurring characters".

Most occurring blocks

Block | Original Dataset | Oversampled Dataset
ASCII | 633 (100.0%) | 2985 (100.0%)

Most frequent character per block

ASCII: same characters, counts and frequencies as under "Most occurring characters".

time
Real number (ℝ)

Statistic | Original Dataset | Oversampled Dataset
Distinct | 66 | 487
Distinct (%) | 31.3% | 48.9%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 71.094787 | 69.952552
Minimum | 31 | 31
Maximum | 130 | 130
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Quantile statistics

Statistic | Original Dataset | Oversampled Dataset
Minimum | 31 | 31
5-th percentile | 34.5 | 34
Q1 | 58.5 | 59
median | 69 | 69
Q3 | 87 | 85
95-th percentile | 110 | 107.28101
Maximum | 130 | 130
Range | 99 | 99
Interquartile range (IQR) | 28.5 | 26

Descriptive statistics

Statistic | Original Dataset | Oversampled Dataset
Standard deviation | 23.715788 | 22.709692
Coefficient of variation (CV) | 0.33357984 | 0.32464423
Kurtosis | -0.66289849 | -0.60357679
Mean | 71.094787 | 69.952552
Median Absolute Deviation (MAD) | 18 | 15
Skewness | 0.18682582 | 0.072098422
Sum | 15001 | 69602.79
Variance | 562.43859 | 515.73012
Monotonicity | Not monotonic | Not monotonic
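Several of the descriptive statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below rechecks the `time` figures with those identities; the helper name `derived_stats` is illustrative, and the inputs are the rounded values printed in this report.

```python
def derived_stats(mean, std, n):
    """Return statistics that follow directly from mean, standard deviation and row count."""
    return {
        "cv": std / mean,       # coefficient of variation
        "variance": std ** 2,   # variance is the squared standard deviation
        "sum": mean * n,        # sum is mean times number of observations
    }


# `time` in the Original Dataset (211 observations).
original = derived_stats(mean=71.094787, std=23.715788, n=211)
# `time` in the Oversampled Dataset (995 observations).
oversampled = derived_stats(mean=69.952552, std=22.709692, n=995)

for name, stats in (("original", original), ("oversampled", oversampled)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

The same check applies to the other numeric variables in this report; small differences in the last digits are expected because the inputs are already rounded.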
Histogram with fixed size bins (bins=50)
Original Dataset
Value | Count | Frequency (%)
44 | 10 | 4.7%
78 | 10 | 4.7%
66 | 10 | 4.7%
38 | 8 | 3.8%
96 | 8 | 3.8%
63 | 7 | 3.3%
61 | 7 | 3.3%
107 | 6 | 2.8%
64 | 6 | 2.8%
34 | 5 | 2.4%
Other values (56) | 134 | 63.5%

Oversampled Dataset
Value | Count | Frequency (%)
78 | 31 | 3.1%
66 | 26 | 2.6%
61 | 26 | 2.6%
44 | 25 | 2.5%
38 | 22 | 2.2%
63 | 19 | 1.9%
96 | 19 | 1.9%
34 | 18 | 1.8%
107 | 16 | 1.6%
64 | 15 | 1.5%
Other values (477) | 778 | 78.2%
Value | Count | Frequency (%)
31 | 2 | 0.9%
32 | 4 | 1.9%
34 | 5 | 2.4%
35 | 1 | 0.5%
36 | 2 | 0.9%
37 | 2 | 0.9%
38 | 8 | 3.8%
39 | 1 | 0.5%
40 | 4 | 1.9%
41 | 2 | 0.9%

Value | Count | Frequency (%)
31 | 5 | 0.5%
31.02710795 | 1 | 0.1%
31.26988438 | 1 | 0.1%
31.37545946 | 1 | 0.1%
31.62190161 | 1 | 0.1%
31.67947121 | 1 | 0.1%
31.85518675 | 1 | 0.1%
31.99237295 | 1 | 0.1%
32 | 11 | 1.1%
32.15991961 | 1 | 0.1%

Value | Count | Frequency (%)
31 | 5 | 2.4%
31.02710795 | 1 | 0.5%
31.26988438 | 1 | 0.5%
31.37545946 | 1 | 0.5%
31.62190161 | 1 | 0.5%
31.67947121 | 1 | 0.5%
31.85518675 | 1 | 0.5%
31.99237295 | 1 | 0.5%
32 | 11 | 5.2%
32.15991961 | 1 | 0.5%

Value | Count | Frequency (%)
31 | 2 | 0.2%
32 | 4 | 0.4%
34 | 5 | 0.5%
35 | 1 | 0.1%
36 | 2 | 0.2%
37 | 2 | 0.2%
38 | 8 | 0.8%
39 | 1 | 0.1%
40 | 4 | 0.4%
41 | 2 | 0.2%

velocity
Real number (ℝ)

Statistic | Original Dataset | Oversampled Dataset
Distinct | 77 | 504
Distinct (%) | 36.5% | 50.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 10.632852 | 10.78691
Minimum | 6.667 | 6.5102958
Maximum | 15.517241 | 15.517241
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Quantile statistics

Statistic | Original Dataset | Oversampled Dataset
Minimum | 6.667 | 6.5102958
5-th percentile | 6.818 | 6.8679363
Q1 | 8.3333333 | 8.4530805
median | 10.345 | 10.737438
Q3 | 13.043 | 13.065442
95-th percentile | 14.877049 | 14.754
Maximum | 15.517241 | 15.517241
Range | 8.8502414 | 9.0069456
Interquartile range (IQR) | 4.7096667 | 4.6123611

Descriptive statistics

Statistic | Original Dataset | Oversampled Dataset
Standard deviation | 2.6445634 | 2.5054963
Coefficient of variation (CV) | 0.24871629 | 0.23227192
Kurtosis | -1.2476413 | -1.2319851
Mean | 10.632852 | 10.78691
Median Absolute Deviation (MAD) | 2.331 | 2.3055619
Skewness | 0.18283275 | 0.060440436
Sum | 2243.5317 | 10732.975
Variance | 6.9937154 | 6.2775116
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=50)
Original Dataset
Value | Count | Frequency (%)
9.375 | 12 | 5.7%
6.818 | 10 | 4.7%
11.538 | 10 | 4.7%
13.636 | 7 | 3.3%
7.895 | 6 | 2.8%
14.754 | 6 | 2.8%
6.667 | 5 | 2.4%
12.329 | 5 | 2.4%
15 | 5 | 2.4%
8.411214953 | 5 | 2.4%
Other values (67) | 140 | 66.4%

Oversampled Dataset
Value | Count | Frequency (%)
11.538 | 31 | 3.1%
9.375 | 30 | 3.0%
6.818 | 25 | 2.5%
14.754 | 23 | 2.3%
13.636 | 17 | 1.7%
8.411214953 | 15 | 1.5%
13.043 | 15 | 1.5%
7.895 | 15 | 1.5%
12.329 | 14 | 1.4%
6.667 | 13 | 1.3%
Other values (494) | 797 | 80.1%
Value | Count | Frequency (%)
6.667 | 5 | 2.4%
6.818 | 10 | 4.7%
6.923 | 2 | 0.9%
6.976744186 | 1 | 0.5%
6.977 | 2 | 0.9%
7.142857143 | 1 | 0.5%
7.143 | 2 | 0.9%
7.317 | 1 | 0.5%
7.317073171 | 1 | 0.5%
7.5 | 4 | 1.9%

Value | Count | Frequency (%)
6.510295828 | 1 | 0.1%
6.64540761 | 1 | 0.1%
6.667 | 13 | 1.3%
6.670616053 | 1 | 0.1%
6.691182871 | 1 | 0.1%
6.706289042 | 1 | 0.1%
6.724935982 | 1 | 0.1%
6.744845747 | 1 | 0.1%
6.78493778 | 1 | 0.1%
6.788839617 | 1 | 0.1%

Value | Count | Frequency (%)
6.510295828 | 1 | 0.5%
6.64540761 | 1 | 0.5%
6.667 | 13 | 6.2%
6.670616053 | 1 | 0.5%
6.691182871 | 1 | 0.5%
6.706289042 | 1 | 0.5%
6.724935982 | 1 | 0.5%
6.744845747 | 1 | 0.5%
6.78493778 | 1 | 0.5%
6.788839617 | 1 | 0.5%

Value | Count | Frequency (%)
6.667 | 5 | 0.5%
6.818 | 10 | 1.0%
6.923 | 2 | 0.2%
6.976744186 | 1 | 0.1%
6.977 | 2 | 0.2%
7.142857143 | 1 | 0.1%
7.143 | 2 | 0.2%
7.317 | 1 | 0.1%
7.317073171 | 1 | 0.1%
7.5 | 4 | 0.4%

ink_visco_cp
Categorical

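Statistic | Original Dataset | Oversampled Dataset
Distinct | 2 | 179
Distinct (%) | 0.9% | 18.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Most frequent values (Original Dataset): 6.9 (160), 6.3 (51)
Most frequent values (Oversampled Dataset): 6.9 (625), 6.3 (193), 6.839189173500493 (1), 6.8893453954216 (1), 6.891526133923265 (1), Other values (174): 174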

Length

Statistic | Original Dataset | Oversampled Dataset
Max length | 3 | 18
Median length | 3 | 3
Mean length | 3 | 5.4884422
Min length | 3 | 3

Characters and Unicode

Statistic | Original Dataset | Oversampled Dataset
Total characters | 633 | 5461
Distinct characters | 4 | 11
Distinct categories | 2 | 2
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Original Dataset | Oversampled Dataset
Unique | 0 | 177
Unique (%) | 0.0% | 17.8%

Sample

Row | Original Dataset | Oversampled Dataset
1st row | 6.3 | 6.9
2nd row | 6.3 | 6.9
3rd row | 6.3 | 6.9
4th row | 6.3 | 6.9
5th row | 6.3 | 6.9

Common Values

Original Dataset
Value | Count | Frequency (%)
6.9 | 160 | 75.8%
6.3 | 51 | 24.2%

Oversampled Dataset
Value | Count | Frequency (%)
6.9 | 625 | 62.8%
6.3 | 193 | 19.4%
6.839189173500493 | 1 | 0.1%
6.8893453954216 | 1 | 0.1%
6.891526133923265 | 1 | 0.1%
6.484134596369307 | 1 | 0.1%
6.904118415801134 | 1 | 0.1%
6.713278084029549 | 1 | 0.1%
6.8957135588350456 | 1 | 0.1%
6.550383574072519 | 1 | 0.1%
Other values (169) | 169 | 17.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Original Dataset

[Figure: plot of common values]

Oversampled Dataset

Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Original Dataset
Value | Count | Frequency (%)
6.9 | 160 | 75.8%
6.3 | 51 | 24.2%

Oversampled Dataset
Value | Count | Frequency (%)
6.9 | 625 | 62.8%
6.3 | 193 | 19.4%
6.907044739318165 | 1 | 0.1%
6.932477700523756 | 1 | 0.1%
6.260091748057529 | 1 | 0.1%
6.279048619135105 | 1 | 0.1%
6.876993977114789 | 1 | 0.1%
6.90381849216281 | 1 | 0.1%
6.897399167525844 | 1 | 0.1%
6.924797493021141 | 1 | 0.1%
Other values (169) | 169 | 17.0%

Most occurring characters

Original Dataset
Character | Count | Frequency (%)
6 | 211 | 33.3%
. | 211 | 33.3%
9 | 160 | 25.3%
3 | 51 | 8.1%

Oversampled Dataset
Character | Count | Frequency (%)
6 | 1221 | 22.4%
. | 995 | 18.2%
9 | 920 | 16.8%
3 | 484 | 8.9%
8 | 296 | 5.4%
1 | 269 | 4.9%
2 | 262 | 4.8%
4 | 260 | 4.8%
5 | 255 | 4.7%
7 | 253 | 4.6%

Most occurring categories

Category | Original Dataset | Oversampled Dataset
Decimal Number | 422 (66.7%) | 4466 (81.8%)
Other Punctuation | 211 (33.3%) | 995 (18.2%)

Most frequent character per category

Decimal Number

Original Dataset
Character | Count | Frequency (%)
6 | 211 | 50.0%
9 | 160 | 37.9%
3 | 51 | 12.1%

Oversampled Dataset
Character | Count | Frequency (%)
6 | 1221 | 27.3%
9 | 920 | 20.6%
3 | 484 | 10.8%
8 | 296 | 6.6%
1 | 269 | 6.0%
2 | 262 | 5.9%
4 | 260 | 5.8%
5 | 255 | 5.7%
7 | 253 | 5.7%
0 | 246 | 5.5%

Other Punctuation

Character | Original Dataset | Oversampled Dataset
. | 211 (100.0%) | 995 (100.0%)

Most occurring scripts

Script | Original Dataset | Oversampled Dataset
Common | 633 (100.0%) | 5461 (100.0%)

Most frequent character per script

Common: same characters, counts and frequencies as under "Most occurring characters".

Most occurring blocks

Block | Original Dataset | Oversampled Dataset
ASCII | 633 (100.0%) | 5461 (100.0%)

Most frequent character per block

ASCII: same characters, counts and frequencies as under "Most occurring characters".
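
surface_tension_dyne_cm
Categorical

Statistic | Original Dataset | Oversampled Dataset
Distinct | 2 | 179
Distinct (%) | 0.9% | 18.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Most frequent values (Original Dataset): 32.3 (160), 30.9 (51)
Most frequent values (Oversampled Dataset): 32.3 (625), 30.9 (193), 32.15810807150115 (1), 32.253758967508595 (1), 32.32647772730582 (1), Other values (174): 174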

Length

Statistic | Original Dataset | Oversampled Dataset
Max length | 4 | 18
Median length | 4 | 4
Mean length | 4 | 6.3839196
Min length | 4 | 4

Characters and Unicode

Statistic | Original Dataset | Oversampled Dataset
Total characters | 844 | 6352
Distinct characters | 5 | 11
Distinct categories | 2 | 2
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Original Dataset | Oversampled Dataset
Unique | 0 | 177
Unique (%) | 0.0% | 17.8%

Sample

Row | Original Dataset | Oversampled Dataset
1st row | 30.9 | 32.3
2nd row | 30.9 | 32.3
3rd row | 30.9 | 32.3
4th row | 30.9 | 32.3
5th row | 30.9 | 32.3

Common Values

Original Dataset
Value | Count | Frequency (%)
32.3 | 160 | 75.8%
30.9 | 51 | 24.2%

Oversampled Dataset
Value | Count | Frequency (%)
32.3 | 625 | 62.8%
30.9 | 193 | 19.4%
32.15810807150115 | 1 | 0.1%
32.253758967508595 | 1 | 0.1%
32.32647772730582 | 1 | 0.1%
31.32964739152838 | 1 | 0.1%
32.31091674299361 | 1 | 0.1%
31.86431552940228 | 1 | 0.1%
32.26350845706199 | 1 | 0.1%
31.484228339502543 | 1 | 0.1%
Other values (169) | 169 | 17.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Original Dataset

[Figure: plot of common values]

Oversampled Dataset

Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Original Dataset
Value | Count | Frequency (%)
32.3 | 160 | 75.8%
30.9 | 51 | 24.2%

Oversampled Dataset
Value | Count | Frequency (%)
32.3 | 625 | 62.8%
30.9 | 193 | 19.4%
32.36619546806442 | 1 | 0.1%
32.22766307477402 | 1 | 0.1%
30.91146251152118 | 1 | 0.1%
30.911189380029512 | 1 | 0.1%
32.30851076097337 | 1 | 0.1%
32.27274201636798 | 1 | 0.1%
32.3026181025658 | 1 | 0.1%
32.308301024451346 | 1 | 0.1%
Other values (169) | 169 | 17.0%

Most occurring characters

Original Dataset
Character | Count | Frequency (%)
3 | 371 | 44.0%
. | 211 | 25.0%
2 | 160 | 19.0%
0 | 51 | 6.0%
9 | 51 | 6.0%

Oversampled Dataset
Character | Count | Frequency (%)
3 | 1906 | 30.0%
2 | 1005 | 15.8%
. | 995 | 15.7%
0 | 469 | 7.4%
9 | 448 | 7.1%
1 | 278 | 4.4%
8 | 278 | 4.4%
6 | 256 | 4.0%
5 | 245 | 3.9%
7 | 238 | 3.7%

Most occurring categories

Category | Original Dataset | Oversampled Dataset
Decimal Number | 633 (75.0%) | 5357 (84.3%)
Other Punctuation | 211 (25.0%) | 995 (15.7%)

Most frequent character per category

Decimal Number

Original Dataset
Character | Count | Frequency (%)
3 | 371 | 58.6%
2 | 160 | 25.3%
0 | 51 | 8.1%
9 | 51 | 8.1%

Oversampled Dataset
Character | Count | Frequency (%)
3 | 1906 | 35.6%
2 | 1005 | 18.8%
0 | 469 | 8.8%
9 | 448 | 8.4%
1 | 278 | 5.2%
8 | 278 | 5.2%
6 | 256 | 4.8%
5 | 245 | 4.6%
7 | 238 | 4.4%
4 | 234 | 4.4%

Other Punctuation

Character | Original Dataset | Oversampled Dataset
. | 211 (100.0%) | 995 (100.0%)

Most occurring scripts

Script | Original Dataset | Oversampled Dataset
Common | 844 (100.0%) | 6352 (100.0%)

Most frequent character per script

Common: same characters, counts and frequencies as under "Most occurring characters".

Most occurring blocks

Block | Original Dataset | Oversampled Dataset
ASCII | 844 (100.0%) | 6352 (100.0%)

Most frequent character per block

ASCII: same characters, counts and frequencies as under "Most occurring characters".

ink _density
Categorical

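Statistic | Original Dataset | Oversampled Dataset
Distinct | 2 | 74
Distinct (%) | 0.9% | 7.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Most frequent values (Original Dataset): 1614 (160), 1517 (51)
Most frequent values (Oversampled Dataset): 1614 (635), 1517 (194), 1613 (13), 1612 (12), 1611 (9), Other values (69): 132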

Length

Statistic | Original Dataset | Oversampled Dataset
Max length | 4 | 4
Median length | 4 | 4
Mean length | 4 | 4
Min length | 4 | 4

Characters and Unicode

Statistic | Original Dataset | Oversampled Dataset
Total characters | 844 | 3980
Distinct characters | 5 | 10
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Original Dataset | Oversampled Dataset
Unique | 0 | 36
Unique (%) | 0.0% | 3.6%

Sample

Row | Original Dataset | Oversampled Dataset
1st row | 1517 | 1614
2nd row | 1517 | 1614
3rd row | 1517 | 1614
4th row | 1517 | 1614
5th row | 1517 | 1614

Common Values

Original Dataset
Value | Count | Frequency (%)
1614 | 160 | 75.8%
1517 | 51 | 24.2%

Oversampled Dataset
Value | Count | Frequency (%)
1614 | 635 | 63.8%
1517 | 194 | 19.5%
1613 | 13 | 1.3%
1612 | 12 | 1.2%
1611 | 9 | 0.9%
1518 | 7 | 0.7%
1615 | 6 | 0.6%
1609 | 5 | 0.5%
1616 | 5 | 0.5%
1519 | 4 | 0.4%
Other values (64) | 105 | 10.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Original Dataset

[Figure: plot of common values]

Oversampled Dataset

Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Original Dataset
Value | Count | Frequency (%)
1614 | 160 | 75.8%
1517 | 51 | 24.2%

Oversampled Dataset
Value | Count | Frequency (%)
1614 | 635 | 63.8%
1517 | 194 | 19.5%
1613 | 13 | 1.3%
1612 | 12 | 1.2%
1611 | 9 | 0.9%
1518 | 7 | 0.7%
1615 | 6 | 0.6%
1609 | 5 | 0.5%
1616 | 5 | 0.5%
1519 | 4 | 0.4%
Other values (64) | 105 | 10.6%

Most occurring characters

Original Dataset
Character | Count | Frequency (%)
1 | 422 | 50.0%
6 | 160 | 19.0%
4 | 160 | 19.0%
5 | 51 | 6.0%
7 | 51 | 6.0%

Oversampled Dataset
Character | Count | Frequency (%)
1 | 1920 | 48.2%
6 | 730 | 18.3%
4 | 649 | 16.3%
5 | 310 | 7.8%
7 | 211 | 5.3%
2 | 36 | 0.9%
8 | 34 | 0.9%
3 | 32 | 0.8%
0 | 29 | 0.7%
9 | 29 | 0.7%

Most occurring categories

Category | Original Dataset | Oversampled Dataset
Decimal Number | 844 (100.0%) | 3980 (100.0%)

Most frequent character per category

Decimal Number: same characters, counts and frequencies as under "Most occurring characters", since every character is a decimal digit.

Most occurring scripts

Script | Original Dataset | Oversampled Dataset
Common | 844 (100.0%) | 3980 (100.0%)

Most frequent character per script

Common: same characters, counts and frequencies as under "Most occurring characters".

Most occurring blocks

Block | Original Dataset | Oversampled Dataset
ASCII | 844 (100.0%) | 3980 (100.0%)

Most frequent character per block

ASCII: same characters, counts and frequencies as under "Most occurring characters".

line_width
Real number (ℝ)

Statistic | Original Dataset | Oversampled Dataset
Distinct | 109 | 170
Distinct (%) | 51.7% | 17.1%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 231.06635 | 252.87839
Minimum | 112 | 112
Maximum | 391 | 457
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Quantile statistics

Statistic | Original Dataset | Oversampled Dataset
Minimum | 112 | 112
5-th percentile | 179.5 | 183
Q1 | 196 | 212
median | 224 | 253
Q3 | 260 | 293
95-th percentile | 305.5 | 322.3
Maximum | 391 | 457
Range | 279 | 345
Interquartile range (IQR) | 64 | 81

Descriptive statistics

Statistic | Original Dataset | Oversampled Dataset
Standard deviation | 43.180632 | 48.828389
Coefficient of variation (CV) | 0.18687547 | 0.1930904
Kurtosis | 0.32551001 | -0.19361054
Mean | 231.06635 | 252.87839
Median Absolute Deviation (MAD) | 31 | 41
Skewness | 0.53350426 | 0.22424905
Sum | 48755 | 251614
Variance | 1864.567 | 2384.2116
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=50)
Original Dataset
Value | Count | Frequency (%)
191 | 6 | 2.8%
189 | 4 | 1.9%
183 | 4 | 1.9%
238 | 4 | 1.9%
207 | 4 | 1.9%
203 | 4 | 1.9%
224 | 4 | 1.9%
194 | 4 | 1.9%
193 | 4 | 1.9%
232 | 4 | 1.9%
Other values (99) | 169 | 80.1%

Oversampled Dataset
Value | Count | Frequency (%)
303 | 27 | 2.7%
288 | 24 | 2.4%
191 | 17 | 1.7%
232 | 17 | 1.7%
245 | 16 | 1.6%
238 | 15 | 1.5%
298 | 14 | 1.4%
224 | 14 | 1.4%
287 | 14 | 1.4%
259 | 14 | 1.4%
Other values (160) | 823 | 82.7%
Value | Count | Frequency (%)
112 | 1 | 0.5%
123 | 1 | 0.5%
142 | 1 | 0.5%
163 | 1 | 0.5%
167 | 1 | 0.5%
176 | 1 | 0.5%
177 | 1 | 0.5%
178 | 1 | 0.5%
179 | 3 | 1.4%
180 | 2 | 0.9%

Value | Count | Frequency (%)
112 | 1 | 0.1%
123 | 3 | 0.3%
142 | 3 | 0.3%
163 | 3 | 0.3%
167 | 3 | 0.3%
176 | 3 | 0.3%
177 | 3 | 0.3%
178 | 3 | 0.3%
179 | 7 | 0.7%
180 | 6 | 0.6%

Value | Count | Frequency (%)
112 | 1 | 0.5%
123 | 3 | 1.4%
142 | 3 | 1.4%
163 | 3 | 1.4%
167 | 3 | 1.4%
176 | 3 | 1.4%
177 | 3 | 1.4%
178 | 3 | 1.4%
179 | 7 | 3.3%
180 | 6 | 2.8%

Value | Count | Frequency (%)
112 | 1 | 0.1%
123 | 1 | 0.1%
142 | 1 | 0.1%
163 | 1 | 0.1%
167 | 1 | 0.1%
176 | 1 | 0.1%
177 | 1 | 0.1%
178 | 1 | 0.1%
179 | 3 | 0.3%
180 | 2 | 0.2%

overspray
Real number (ℝ)

Statistic | Original Dataset | Oversampled Dataset
Distinct | 140 | 351
Distinct (%) | 66.4% | 35.3%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 248.99526 | 399.72462
Minimum | 0 | 0
Maximum | 4762 | 4812
Zeros | 8 | 25
Zeros (%) | 3.8% | 2.5%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Quantile statistics

Statistic | Original Dataset | Oversampled Dataset
Minimum | 0 | 0
5-th percentile | 1.5 | 3
Q1 | 23.5 | 38.5
median | 79 | 121
Q3 | 227.5 | 333.5
95-th percentile | 1029 | 1710.5
Maximum | 4762 | 4812
Range | 4762 | 4812
Interquartile range (IQR) | 204 | 295

Descriptive statistics

Statistic | Original Dataset | Oversampled Dataset
Standard deviation | 563.70434 | 816.9087
Coefficient of variation (CV) | 2.263916 | 2.0436787
Kurtosis | 32.599365 | 14.450975
Mean | 248.99526 | 399.72462
Median Absolute Deviation (MAD) | 71 | 102
Skewness | 5.2091358 | 3.7336719
Sum | 52538 | 397726
Variance | 317762.59 | 667339.83
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=50)
Original Dataset
Value | Count | Frequency (%)
0 | 8 | 3.8%
10 | 5 | 2.4%
91 | 5 | 2.4%
3 | 5 | 2.4%
220 | 4 | 1.9%
7 | 4 | 1.9%
32 | 4 | 1.9%
47 | 4 | 1.9%
24 | 3 | 1.4%
5 | 3 | 1.4%
Other values (130) | 166 | 78.7%

Oversampled Dataset
Value | Count | Frequency (%)
0 | 25 | 2.5%
10 | 15 | 1.5%
32 | 13 | 1.3%
47 | 13 | 1.3%
201 | 13 | 1.3%
7 | 12 | 1.2%
3 | 11 | 1.1%
91 | 11 | 1.1%
107 | 10 | 1.0%
220 | 10 | 1.0%
Other values (341) | 862 | 86.6%
Value | Count | Frequency (%)
0 | 8 | 3.8%
1 | 3 | 1.4%
2 | 3 | 1.4%
3 | 5 | 2.4%
4 | 1 | 0.5%
5 | 3 | 1.4%
6 | 1 | 0.5%
7 | 4 | 1.9%
8 | 2 | 0.9%
9 | 1 | 0.5%

Value | Count | Frequency (%)
0 | 25 | 2.5%
1 | 9 | 0.9%
2 | 7 | 0.7%
3 | 11 | 1.1%
4 | 2 | 0.2%
5 | 8 | 0.8%
6 | 3 | 0.3%
7 | 12 | 1.2%
8 | 6 | 0.6%
9 | 5 | 0.5%

Value | Count | Frequency (%)
0 | 25 | 11.8%
1 | 9 | 4.3%
2 | 7 | 3.3%
3 | 11 | 5.2%
4 | 2 | 0.9%
5 | 8 | 3.8%
6 | 3 | 1.4%
7 | 12 | 5.7%
8 | 6 | 2.8%
9 | 5 | 2.4%

Value | Count | Frequency (%)
0 | 8 | 0.8%
1 | 3 | 0.3%
2 | 3 | 0.3%
3 | 5 | 0.5%
4 | 1 | 0.1%
5 | 3 | 0.3%
6 | 1 | 0.1%
7 | 4 | 0.4%
8 | 2 | 0.2%
9 | 1 | 0.1%

roughness
Real number (ℝ)

Statistic | Original Dataset | Oversampled Dataset
Distinct | 99 | 133
Distinct (%) | 46.9% | 13.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 99.218009 | 114.16784
Minimum | 43 | 43
Maximum | 192 | 228
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 11.4 KiB | 15.5 KiB

Quantile statistics

Statistic | Original Dataset | Oversampled Dataset
Minimum | 43 | 43
5-th percentile | 58.5 | 63
Q1 | 75 | 84
median | 93 | 116
Q3 | 120 | 144
95-th percentile | 152.5 | 167
Maximum | 192 | 228
Range | 149 | 185
Interquartile range (IQR) | 45 | 60

Descriptive statistics

Statistic | Original Dataset | Oversampled Dataset
Standard deviation | 30.910005 | 35.17643
Coefficient of variation (CV) | 0.31153624 | 0.30811155
Kurtosis | -0.12808744 | -0.67833078
Mean | 99.218009 | 114.16784
Median Absolute Deviation (MAD) | 21 | 30
Skewness | 0.63491566 | 0.13863537
Sum | 20935 | 113597
Variance | 955.42844 | 1237.3813
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=50)
Original Dataset
Value | Count | Frequency (%)
68 | 8 | 3.8%
77 | 8 | 3.8%
84 | 7 | 3.3%
73 | 6 | 2.8%
99 | 6 | 2.8%
75 | 5 | 2.4%
85 | 5 | 2.4%
108 | 5 | 2.4%
72 | 5 | 2.4%
91 | 4 | 1.9%
Other values (89) | 152 | 72.0%

Oversampled Dataset
Value | Count | Frequency (%)
142 | 23 | 2.3%
68 | 21 | 2.1%
77 | 21 | 2.1%
99 | 20 | 2.0%
84 | 20 | 2.0%
145 | 18 | 1.8%
152 | 18 | 1.8%
131 | 17 | 1.7%
144 | 17 | 1.7%
147 | 17 | 1.7%
Other values (123) | 803 | 80.7%
Value | Count | Frequency (%)
43 | 1 | 0.5%
44 | 1 | 0.5%
45 | 1 | 0.5%
48 | 2 | 0.9%
49 | 2 | 0.9%
54 | 1 | 0.5%
57 | 1 | 0.5%
58 | 2 | 0.9%
59 | 1 | 0.5%
60 | 1 | 0.5%

Value | Count | Frequency (%)
43 | 3 | 0.3%
44 | 4 | 0.4%
45 | 4 | 0.4%
47 | 1 | 0.1%
48 | 4 | 0.4%
49 | 5 | 0.5%
52 | 1 | 0.1%
54 | 3 | 0.3%
56 | 1 | 0.1%
57 | 2 | 0.2%

Value | Count | Frequency (%)
43 | 3 | 1.4%
44 | 4 | 1.9%
45 | 4 | 1.9%
47 | 1 | 0.5%
48 | 4 | 1.9%
49 | 5 | 2.4%
52 | 1 | 0.5%
54 | 3 | 1.4%
56 | 1 | 0.5%
57 | 2 | 0.9%

Value | Count | Frequency (%)
43 | 1 | 0.1%
44 | 1 | 0.1%
45 | 1 | 0.1%
48 | 2 | 0.2%
49 | 2 | 0.2%
54 | 1 | 0.1%
57 | 1 | 0.1%
58 | 2 | 0.2%
59 | 1 | 0.1%
60 | 1 | 0.1%

Interactions

[Figures: 25 pairwise interaction plots, each rendered once for the Original Dataset and once for the Oversampled Dataset.]

Correlations

[Figure: correlation heatmap]

Variable | time | velocity | line_width | overspray | roughness | distance | ink_visco_cp | surface_tension_dyne_cm | ink _density
time | 1.000 | -0.011 | -0.030 | -0.057 | -0.112 | 0.691 | 0.247 | 0.247 | 0.247
velocity | -0.011 | 1.000 | 0.275 | 0.148 | 0.106 | 0.500 | 0.275 | 0.275 | 0.275
line_width | -0.030 | 0.275 | 1.000 | 0.308 | 0.610 | 0.000 | 0.000 | 0.000 | 0.000
overspray | -0.057 | 0.148 | 0.308 | 1.000 | 0.238 | 0.000 | 0.000 | 0.000 | 0.000
roughness | -0.112 | 0.106 | 0.610 | 0.238 | 1.000 | 0.163 | 0.197 | 0.197 | 0.197
distance | 0.691 | 0.500 | 0.000 | 0.000 | 0.163 | 1.000 | 0.149 | 0.149 | 0.149
ink_visco_cp | 0.247 | 0.275 | 0.000 | 0.000 | 0.197 | 0.149 | 1.000 | 0.987 | 0.987
surface_tension_dyne_cm | 0.247 | 0.275 | 0.000 | 0.000 | 0.197 | 0.149 | 0.987 | 1.000 | 0.987
ink _density | 0.247 | 0.275 | 0.000 | 0.000 | 0.197 | 0.149 | 0.987 | 0.987 | 1.000
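For pairs of categorical variables, one common association measure is Cramér's V. Whether this report uses exactly this statistic for its categorical columns is an assumption here. The sketch below computes the plain, uncorrected form with the standard library only; the function name `cramers_v` is illustrative. The example data follow the Original Dataset counts listed earlier (51 rows with ink_visco_cp 6.3, 160 rows with 6.9) and assume that each viscosity value always appears with the same density value, as in the sample rows.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V between two equally long category sequences."""
    n = len(xs)
    rows, cols = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    k = min(len(rows), len(cols)) - 1
    if n == 0 or k == 0:
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_total in rows.items():
        for y, col_total in cols.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Perfectly paired categories give V = 1 in this uncorrected form.
visco = ["6.3"] * 51 + ["6.9"] * 160
density = ["1517"] * 51 + ["1614"] * 160
print(round(cramers_v(visco, density), 3))
```

This plain form is not expected to reproduce the matrix values digit for digit, since the tool may apply corrections that the sketch omits.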

Missing values

[Figures: nullity count by column, for the Original Dataset and the Oversampled Dataset]
A simple visualization of nullity by column.

[Figures: nullity matrix, for the Original Dataset and the Oversampled Dataset]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Original Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness
027034.07.9416.330.9151729412164
127034.07.9416.330.91517261136141
230038.07.8956.330.9151721811103
330044.06.8186.330.915171901568
430041.07.3176.330.915171909190
530040.07.5006.932.31614180062
630038.07.8956.932.316141788082
730043.06.9776.330.9151718524145
830043.06.9776.330.9151721350161
930034.08.8246.330.915173238171

Oversampled Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness
090088.40324210.3831726.932.3161428771118
190087.53308610.5572826.932.3161428710691
2900103.3764958.7923146.932.31614290120134
390068.47628813.5406416.932.31614290106120
490085.07015411.2829726.932.31614294104133
590098.2249429.1956216.932.3161429582131
690088.43737010.2430706.932.3161428859111
790085.04701510.6637006.932.3161428854106
890088.58212210.2251116.932.3161428861121
990080.54601411.5902296.932.3161428781118

Original Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness
201900108.08.3333336.932.3161421228272
20290093.09.6774196.932.3161432347157
20390093.09.6774196.932.31614305201108
20490094.09.5744686.932.3161428810785
20590095.09.4736846.932.3161429024115
20690096.09.3750006.932.316142621794
20790096.09.3750006.932.316142411586
20890096.09.3750006.932.316141917787
209900108.08.3333336.932.31614188173
210900107.08.4112156.932.31614203545

Oversampled Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness
199900107.08.4112156.932.31614204573
200900108.08.3333336.932.316141918199
201900108.08.3333336.932.3161421228272
20290093.09.6774196.932.3161432347157
20390093.09.6774196.932.31614305201108
20590095.09.4736846.932.3161429024115
20690096.09.3750006.932.316142621794
20890096.09.3750006.932.316141917787
209900108.08.3333336.932.31614188173
210900107.08.4112156.932.31614203545

Duplicate rows

Original Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness# duplicates
Dataset does not contain duplicate rows.

Oversampled Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness# duplicates
230031.09.6776.330.915172181871173
430032.09.3756.330.915173061021453
530032.09.3756.932.31614183107843
630032.09.3756.932.316142242151283
830034.08.8246.330.9151722627953
930034.08.8246.330.9151732381713
1030034.08.8246.932.316142192131853
1130035.08.5716.932.31614232168993
1230036.08.3336.932.3161414227763
1430037.08.1086.932.316142742831493